Abstract



Motivated by the success of a k-clique percolation method for
the identification of overlapping communities in large real networks,
here we study the k-clique percolation problem in the Erdős-Rényi
graph. When the probability p of two nodes being connected is
above a certain threshold $p_c(k)$, the complete subgraphs of size k
(the k-cliques) are organized into a giant cluster. By making some
assumptions that are expected to be valid below the threshold, we
determine the average size of the k-clique percolation clusters, using a
generating function formalism. From the divergence of this average
size we then derive an analytic expression for the critical linking
probability $p_c(k)$.

1 Introduction 

Many complex systems in nature and society can be successfully
represented in terms of networks capturing the intricate web of connections
among the units they are made of. Graphs corresponding to these real
networks exhibit unexpected non-trivial properties, e.g., new kinds of
degree distributions, anomalous diameter, spreading phenomena, clustering
coefficient, and correlations. In recent years, there has been
a quickly growing interest in the local structural units of networks. Small
and well defined subgraphs consisting of a few vertices have been
introduced as "motifs". Their distribution and clustering properties
can be considered as important global characteristics of real networks.
Somewhat larger units, associated with more highly interconnected parts,
are usually called clusters, communities, cohesive groups, or modules,
with no widely accepted, unique definition. Such building blocks
(functionally related proteins, industrial sectors [2], groups of people,
cooperative players, etc.) can play a crucial role in the structural and
functional properties of the networks involved. The presence of
communities is also a relevant and informative signature of the
hierarchical nature of complex systems.

Most of the methods used for the identification of communities rely on
dividing the network into smaller pieces. The biggest drawback of these
methods is that they do not allow the communities to overlap. On
the other hand, the communities in a complex system are often not
isolated from each other, but rather, they have overlaps, e.g., a protein
can be part of more than one functional unit [30], and people can be
members of different social groups at the same time [31]. One possibility
to overcome this problem is to use a community definition based on k-clique
percolation [32, 33]. In this approach the communities are associated with
k-clique percolation clusters, and can overlap with each other. The
communities of large real networks obtained with this method were shown to
have significant overlaps, and the statistical properties of the communities
exhibited non-trivial universal features [33].

In this manuscript we focus on the basic properties of k-clique
percolation. In a recent work we have already proposed an expression for
the critical point of the k-clique percolation in the Erdős-Rényi (E-R)
graph using simple heuristic arguments [32]. This expression has also been
supported by our numerical simulations. The goal of this manuscript is to
make these results stronger by providing a more detailed analytical
derivation using only a few reasonable assumptions, expected to be valid
below the critical point. We note that the critical point of k-clique
percolation plays a crucial role in community finding as well. When
dealing with a network containing weighted links, one can introduce a
weight threshold and exclude links weaker than the threshold from the
investigation. When the threshold is very high, only a few disintegrated
communities remain, whereas in the case of a very low threshold, a giant
community arises, smearing out the details of the community structure by
merging (and making invisible) many smaller communities. To find a
community structure as highly structured as possible, one needs to set the
threshold close to the critical point of the k-clique percolation.

2 k-clique percolation in the E-R graph

In the field of complex networks, the classical E-R uncorrelated random
graph [33] serves both as a test bed for checking all sorts of new ideas
concerning networks in general, and as a prototype of random graphs to
which all other random graphs can be compared. One of the most conspicuous
early results on the E-R graphs was related to the percolation transition
taking place at $p = p_c = 1/N$, where p is the probability that two
vertices are connected by an edge and N is the total number of vertices in
the graph. The appearance of a giant component in a network, which is also
referred to as the percolating component, results in a dramatic change in
the overall topological features of the graph and has been in the center of
interest for other networks as well.

In this manuscript we address the general question of subgraph
percolation in the E-R model. In particular, we provide an analytic
expression for the threshold probability at which the percolation
transition of complete subgraphs of size k (the k-cliques) takes place.
Before proceeding we need to go through some basic definitions:

• k-clique: a complete (fully connected) subgraph of k vertices [35].

• k-clique adjacency: two k-cliques are adjacent if they share k - 1
vertices, i.e., if they differ only in a single node.

• k-clique chain: a subgraph, which is the union of a sequence of
adjacent k-cliques.

• k-clique connectedness: two k-cliques are k-clique-connected if they
are parts of a k-clique chain.

• k-clique percolation cluster (or component): a maximal k-clique-connected
subgraph, i.e., it is the union of all k-cliques that are k-clique-connected
to a particular k-clique.

The above concept of k-clique percolation is illustrated in Fig. 1, where
both graphs contain two 3-clique percolation clusters, the smaller ones in
dark gray and the larger ones in black. We note that these objects can be
considered as interesting specific cases of the general graph theoretic objects
defined by Everett and Borgatti [20] and by Batagelj and Zaversnik [37] in
very different contexts.



Figure 1: Sketches of two E-R graphs of N = 20 vertices and with edge
probabilities p = 0.13 (left one) and p = 0.22 (right one, generated by adding
more random edges to the left one). In both cases all the edges belong to a
"giant" connected component, because the edge probabilities are much larger
than the threshold ($p_c = 1/N = 0.05$) for the classical E-R percolation
transition. However, in the left one p is below the 3-clique percolation
threshold, $p_c(3) \approx 0.16$, calculated from Eq. (19), therefore, only a
few scattered 3-cliques (triangles) and small 3-clique percolation clusters
(distinguished by black and dark gray edges) can be observed. In the right
one, on the other hand, p is above this threshold and, as a consequence, most
3-cliques accumulate in a "giant" 3-clique percolation cluster (black edges).
This graph also illustrates the overlap (half black, half dark gray vertex)
between two clusters (black and dark gray).

An illustration of the k-clique percolation clusters can be given by
"k-clique template rolling". A k-clique template can be thought of as an
object that is isomorphic to a complete graph of k nodes. Such a template can
be placed onto any k-clique of the network, and rolled to an adjacent k-clique
by relocating one of its nodes and keeping its other k - 1 nodes fixed. Thus,
the k-clique communities of a graph are all those subgraphs that can be
fully explored by rolling a k-clique template in them but cannot be left by
this template. We note that a k-clique percolation cluster is very much like
a regular edge percolation cluster in the k-clique adjacency graph, where the
vertices represent the k-cliques of the original graph, and there is an edge
between two vertices if the corresponding k-cliques are adjacent. Moving
a particle from one vertex of this adjacency graph to another one along an
edge is equivalent to rolling a k-clique template from one k-clique of the
original graph to an adjacent one.
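
The definitions above can be turned into a short brute-force procedure. The
following Python sketch is only an illustration, not the implementation used
for the results cited in this manuscript: the function name
`k_clique_clusters` and the small example graph are hypothetical, and the
exhaustive enumeration of node subsets is feasible only for small graphs. It
lists all k-cliques, joins those sharing k - 1 nodes (the edges of the
k-clique adjacency graph), and returns the node sets of the resulting
clusters.

```python
from itertools import combinations


def k_clique_clusters(nodes, edges, k):
    """Return the k-clique percolation clusters as a list of node sets."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Brute-force enumeration of all k-cliques (small graphs only).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(nodes), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over the cliques: two cliques sharing k - 1 nodes are
    # adjacent, so they belong to the same percolation cluster.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A cluster is the union of the nodes of all cliques in one component.
    clusters = {}
    for i, clique in enumerate(cliques):
        clusters.setdefault(find(i), set()).update(clique)
    return list(clusters.values())


# Hypothetical example: two triangles sharing an edge, plus a third
# triangle that touches them in a single node only.
example_nodes = [1, 2, 3, 4, 5, 6]
example_edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4),
                 (4, 5), (4, 6), (5, 6)]
example_clusters = k_clique_clusters(example_nodes, example_edges, 3)
print(sorted(sorted(c) for c in example_clusters))
```

In this example node 4 belongs to both clusters, which is the kind of overlap
the definition allows. For k = 2 the same function returns the ordinary
connected components that contain at least one edge.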



3 The generating functions 

In our investigation of the critical point of the k-clique percolation in the
E-R graph we shall rely on the generating function formalism in a fashion 
similar to that of Ref . . Therefore, in this section we first summarize the 
definition and the most important properties of the generating functions. 
If a random variable $\xi$ can take non-negative integer values according to
some probability distribution $\mathcal{P}(\xi = n) = p(n)$, then the
corresponding generating function is given by

$$G_p(x) \equiv \langle x^{\xi} \rangle = \sum_{n=0}^{\infty} p(n)\, x^n. \tag{1}$$



The generating function of a properly normalized distribution is absolutely
convergent for all $|x| < 1$ and hence has no singularities in this region.
For x = 1 it is simply

$$G_p(1) = \sum_{n=0}^{\infty} p(n) = 1. \tag{2}$$



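The original probability distribution and its moments can be obtained from
the generating function as

$$p(n) = \frac{1}{n!} \left. \frac{d^n G_p(x)}{dx^n} \right|_{x=0}, \tag{3}$$

$$\langle \xi^l \rangle = \sum_{n=0}^{\infty} n^l\, p(n) = \left[ \left( x \frac{d}{dx} \right)^{l} G_p(x) \right]_{x=1}. \tag{4}$$
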
And finally, if $\eta = \xi_1 + \xi_2 + \cdots + \xi_l$, where
$\xi_1, \xi_2, \ldots, \xi_l$ are independent random variables (with
non-negative integer values), then the generating function corresponding to
$\mathcal{P}(\eta = n) = \sigma(n)$ is given by

$$G_\sigma(x) = \langle x^{\eta} \rangle = \langle x^{\xi_1} x^{\xi_2} \cdots x^{\xi_l} \rangle = \langle x^{\xi_1} \rangle \langle x^{\xi_2} \rangle \cdots \langle x^{\xi_l} \rangle = G_{p_1}(x)\, G_{p_2}(x) \cdots G_{p_l}(x). \tag{5}$$
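
The properties summarized in Eqs. (1), (2), (4) and (5) can be checked
numerically. The sketch below is only an illustration under stated
assumptions: the helper names (`poisson_pmf`, `G`, `first_moment`,
`convolve`) are hypothetical, the two Poisson distributions and their means
are arbitrary choices, and the distributions are truncated at a finite
n_max, so the identities hold up to a negligible tail.

```python
import math


def poisson_pmf(lam, n_max):
    """Poisson probabilities p(0), ..., p(n_max); the tail is dropped."""
    return [math.exp(-lam) * lam**n / math.factorial(n)
            for n in range(n_max + 1)]


def G(p, x):
    """Generating function G_p(x) = sum_n p(n) x^n, as in Eq. (1)."""
    return sum(pn * x**n for n, pn in enumerate(p))


def first_moment(p):
    """First moment sum_n n p(n), i.e. Eq. (4) with l = 1."""
    return sum(n * pn for n, pn in enumerate(p))


def convolve(p1, p2):
    """Distribution of the sum of two independent variables."""
    out = [0.0] * (len(p1) + len(p2) - 1)
    for i, a in enumerate(p1):
        for j, b in enumerate(p2):
            out[i + j] += a * b
    return out


p1 = poisson_pmf(0.7, 40)   # arbitrary illustrative means
p2 = poisson_pmf(1.1, 40)
sigma = convolve(p1, p2)    # distribution of the sum
x = 0.6

print(G(p1, 1.0))                          # normalization, Eq. (2)
print(G(sigma, x), G(p1, x) * G(p2, x))    # product rule, Eq. (5)
print(first_moment(sigma))                 # sum of the two means
```

The product rule holds here for any x because multiplying two polynomials is
the same operation as convolving their coefficient lists.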

4 The critical point 

In this section we arrive at the derivation of the critical point of the
k-clique percolation in the E-R graph in the $N \to \infty$ limit. We begin by
considering the probability distribution r(n) of the number of k-cliques
adjacent to a randomly selected k-clique. Finding a k-clique B adjacent to a
selected k-clique A is equivalent to finding a node outside A linked to at
least k - 1 nodes in A. The number of possibilities for this node is N - k.
Links in the E-R graph are independent of each other, therefore the
probability that a given node is linked to all nodes in A is $p^k$, whereas
the probability that it is linked to exactly k - 1 nodes in A is
$k(1-p)p^{k-1}$. Therefore, to leading order in N the average number of
k-cliques adjacent to a randomly selected one is

$$\langle r \rangle = (N-k) \left[ k(1-p)p^{k-1} + p^k \right] \simeq N k p^{k-1}. \tag{6}$$

From the independence of the links it also follows that the probability
distribution r(n) becomes Poissonian, which can be written as

$$r(n) = \exp\left(-Nkp^{k-1}\right) \frac{\left(Nkp^{k-1}\right)^n}{n!}. \tag{7}$$
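
To see how good the leading-order form in Eq. (6) is, the two expressions can
be compared directly. The sketch below is an illustration only: the function
names are hypothetical, and the scaling of p with N is an assumption made
here so that $N(k-1)p^{k-1} = 1$ and the mean stays of order one as N grows.

```python
def mean_adjacent_exact(N, k, p):
    """(N - k) [k (1 - p) p^(k-1) + p^k], the first expression in Eq. (6)."""
    return (N - k) * (k * (1.0 - p) * p ** (k - 1) + p ** k)


def mean_adjacent_leading(N, k, p):
    """N k p^(k-1), the leading-order form in Eq. (6)."""
    return N * k * p ** (k - 1)


k = 3
for N in (10**2, 10**4, 10**6):
    # Assumed scaling: p chosen so that N (k - 1) p^(k-1) = 1.
    p = (N * (k - 1)) ** (-1.0 / (k - 1))
    ratio = mean_adjacent_exact(N, k, p) / mean_adjacent_leading(N, k, p)
    print(N, p, ratio)
```

With this scaling the ratio of the two expressions approaches one as N
increases, while the leading-order mean stays fixed at k/(k - 1).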

Let us suppose that we are below the percolation threshold, therefore,
k-cliques are rare, adjacent k-cliques are even more rare, and loops in the
k-clique adjacency graph are so rare that we can assume it to be tree-like
[39]. In this case the size of a connected component in the k-clique
adjacency graph (corresponding to a k-clique percolation cluster) can be
evaluated by counting the number of k-cliques reached in an "invasion"
process as follows. We start at an arbitrary k-clique in the component, and
in the first step we invade all its neighbors in the k-clique adjacency
graph. From then on, whenever a k-clique is reached, we proceed by invading
all its neighbors, except for the one the k-clique has been reached from, as
shown in Fig. 2. In terms of the original graph, this is equivalent to
rolling a k-clique template to all adjacent k-cliques except for the one we
arrived from in the previous step.

Figure 2: Schematic picture of the evaluation of the size of a k-clique
percolation cluster by counting the number of k-cliques reached in an
"invasion" process. a) Let us suppose that we arrived at the black colored
k-clique from the k-clique marked by dashed lines. In the next step we must
proceed to the k-cliques shown in dark gray, and then finally to the
k-cliques marked in light gray. b) The corresponding k-clique adjacency graph
is shown on the right. The size of the connected component in the k-clique
adjacency graph we can invade from the black k-clique (by excluding the link
through which we initially reached it) is equal to one plus the sum of the
sizes of the connected components invaded from the dark gray k-cliques in the
same way.

In the invasion process described above, we can assign to each k-clique
the subgraph in the k-clique percolation cluster that was invaded from it.
(Note that we assumed the k-clique adjacency graph to be tree-like.) Let us
denote by I(n) the probability that the subgraph reached from an arbitrary
starting k-clique in the invasion process contains n k-cliques,
including the starting k-clique as well. This subgraph is actually equal to
a k-clique percolation cluster. Similarly, let H(n) denote the probability
that the subgraph invaded from a k-clique appearing later in the invasion
process (i.e., from a k-clique that is not the starting one) contains n
k-cliques. This is equivalent to the probability that by starting at a
randomly selected k-clique and trying to roll a k-clique template via all
possible subsets of size k - 1 except for one, then by successively rolling
the template on and on, in all possible directions without turning back, a
k-clique percolation "branch" of size n is invaded. And finally, let
$H_m(n)$ be the probability that if we pick m k-cliques randomly, then the
sum of the sizes of the k-clique branches that we can invade in this way
consists of n k-cliques. Since we are below the percolation threshold, the
k-clique adjacency graph consists of many dispersed components of small
size, and the probability that two (or more) k-cliques out of m belong to the
same k-clique percolation cluster is negligible. Hence, according to Eq. (5),
the generating functions corresponding to H(n) and $H_m(n)$, denoted by
$G_H(x)$ and $G_{H_m}(x)$ respectively, are related to each other as:

$$G_{H_m}(x) = \left[ G_H(x) \right]^m. \tag{8}$$



Let q(n) denote the probability that for a randomly selected k-clique,
by excluding one of its possible subsets of size k - 1, we can roll a
k-clique template through the remaining subsets to n adjacent k-cliques.
This distribution is very similar to r(n), except that in this case we can use
only k - 1 subsets instead of k in the k-clique to roll the k-clique template
further, therefore

$$q(n) = \exp\left(-N(k-1)p^{k-1}\right) \frac{\left[N(k-1)p^{k-1}\right]^n}{n!}. \tag{9}$$



By neglecting the loops in the k-clique adjacency graph, H(n) can be
expressed as

$$H(n) = q(0)H_0(n-1) + q(1)H_1(n-1) + q(2)H_2(n-1) + \cdots, \tag{10}$$



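as explained in Fig. 2. By taking the generating function of both sides and
using Eqs. (3) and (8), we obtain

$$G_H(x) = \sum_{n=1}^{\infty} \left[ \sum_{m=0}^{\infty} q(m) H_m(n-1) \right] x^n = \sum_{n=1}^{\infty} \left[ \sum_{m=0}^{\infty} q(m) \frac{1}{(n-1)!} \left. \frac{d^{n-1}}{dx^{n-1}} \left[ G_H(x) \right]^m \right|_{x=0} \right] x^n = \sum_{m=0}^{\infty} q(m) \left[ G_H(x) \right]^m x = x\, G_q(G_H(x)), \tag{11}$$

where $G_q(x)$ denotes the generating function of the distribution q(n).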
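
Eq. (11) defines $G_H(x)$ only implicitly, but below the threshold it can be
solved numerically. The sketch below is an illustration, not part of the
derivation: it assumes the Poisson form of q(n) from Eq. (9), for which
$G_q(x) = \exp[\langle q \rangle (x-1)]$, and uses a plain fixed-point
iteration started from zero. The function names, the tolerance, the value
$\langle q \rangle = 0.5$ and the finite-difference step are all arbitrary
choices made here.

```python
import math


def G_q(x, q_mean):
    """Generating function of a Poisson distribution with mean q_mean."""
    return math.exp(q_mean * (x - 1.0))


def G_H(x, q_mean, tol=1e-14, max_iter=100000):
    """Solve h = x * G_q(h), i.e. Eq. (11), by fixed-point iteration."""
    h = 0.0
    for _ in range(max_iter):
        h_new = x * G_q(h, q_mean)
        if abs(h_new - h) < tol:
            return h_new
        h = h_new
    return h


q_mean = 0.5   # assumed value below the threshold (q_mean < 1)
eps = 1e-6     # step of the one-sided finite difference

# Normalization: G_H(1) should be 1 when all branches are finite.
print(G_H(1.0, q_mean))

# Slope at x = 1, i.e. the mean branch size, estimated numerically and
# compared with 1 / (1 - q_mean), the value obtained by differentiating
# h = x * G_q(h) at x = 1.
slope = (G_H(1.0, q_mean) - G_H(1.0 - eps, q_mean)) / eps
print(slope, 1.0 / (1.0 - q_mean))
```

The iteration converges only slowly as the mean of q(n) approaches one, which
reflects the growth of the branch sizes near the threshold.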

We can write an equation similar to Eq. (10) for I(n) as well, in the
form of

$$I(n) = r(0)H_0(n-1) + r(1)H_1(n-1) + r(2)H_2(n-1) + \cdots. \tag{12}$$

Again, by taking the generating functions of both sides we arrive at

$$G_I(x) = x\, G_r(G_H(x)), \tag{13}$$

where $G_r(x)$ denotes the generating function of r(n). From Eqs. (2) and
(13) we get

$$\langle I \rangle = G_I'(1) = G_r(G_H(1)) + G_r'(G_H(1))\, G_H'(1) = 1 + G_r'(1)\, G_H'(1) \tag{14}$$

for the average size of the components invaded from a randomly selected
k-clique. Using Eq. (11) we can write

$$G_H'(1) = G_q(G_H(1)) + G_q'(G_H(1))\, G_H'(1) = 1 + G_q'(1)\, G_H'(1), \tag{15}$$

from which $G_H'(1)$ can be expressed as

$$G_H'(1) = \frac{1}{1 - G_q'(1)} = \frac{1}{1 - \langle q \rangle}. \tag{16}$$

By substituting this back into Eq. (14) we get

$$\langle I \rangle = 1 + \frac{G_r'(1)}{1 - G_q'(1)} = 1 + \frac{\langle r \rangle}{1 - \langle q \rangle}. \tag{17}$$

The above expression for the expected size of the connected components
in the k-clique adjacency graph invaded from a randomly selected k-clique
diverges when

$$\langle q \rangle = N(k-1)p^{k-1} = 1. \tag{18}$$

This point marks the phase transition at which a giant component
(corresponding to a giant k-clique percolation cluster) first appears.
Therefore, our final result for the critical linking probability for the
appearance of the giant component is

$$p_c(k) = \frac{1}{\left[ N(k-1) \right]^{\frac{1}{k-1}}}. \tag{19}$$

This result confirms the findings of Ref. [35], which were based on heuristic
arguments and numerical simulations. For k = 2 our result agrees with the
known percolation threshold ($p_c = 1/N$) for E-R graphs, because 2-clique
connectedness is equivalent to regular (edge) connectedness.
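
The final formulas are simple enough to evaluate directly. The sketch below is
an illustration only: the function names are hypothetical, and applying the
$N \to \infty$ expressions to a graph as small as N = 20 (the size used in
Fig. 1) is done here purely to reproduce the threshold values quoted in the
text. It evaluates Eq. (19) and shows how the mean cluster size of Eq. (17),
with the leading-order means of Eqs. (6) and (18), grows as p approaches the
threshold from below.

```python
def p_c(k, N):
    """Critical linking probability of Eq. (19)."""
    return (N * (k - 1)) ** (-1.0 / (k - 1))


def mean_cluster_size(p, k, N):
    """<I> = 1 + <r> / (1 - <q>) of Eq. (17), valid only for <q> < 1."""
    r_mean = N * k * p ** (k - 1)        # leading-order form of Eq. (6)
    q_mean = N * (k - 1) * p ** (k - 1)  # left-hand side of Eq. (18)
    if q_mean >= 1.0:
        raise ValueError("at or above the threshold the expression diverges")
    return 1.0 + r_mean / (1.0 - q_mean)


N = 20
print(p_c(2, N))   # 1/N, the classical E-R threshold
print(p_c(3, N))   # roughly 0.158, quoted as 0.16 in the caption of Fig. 1

# The mean cluster size grows without bound as p approaches p_c(3).
for fraction in (0.5, 0.9, 0.99):
    print(fraction, mean_cluster_size(fraction * p_c(3, N), 3, N))
```

Because the ratio of the two means is k/(k - 1) at every p, the mean cluster
size in this approximation depends on p only through the distance of
$N(k-1)p^{k-1}$ from one.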



5 Conclusions 

The phenomenon of k-clique percolation provides an effective tool for
finding overlapping communities in large networks. In this article we derived
the critical linking probability for the E-R graph in the $N \to \infty$
limit. Our method involved the use of generating functions and was based on
the assumption that up to the critical point, loops in the k-clique adjacency
graph are negligible. Our findings are in complete agreement with earlier
results based on heuristic arguments and numerical simulations.

This work has been supported in part by the Hungarian Science Foun- 
dation (OTKA), grant Nos. F047203 and T049674. 